Data Science: Visualization
Section 1: Introduction to Data Visualization and Distributions
You will get started with data visualization and distributions in R. After completing this section, you will:
- understand the importance of data visualization for communicating data-driven findings.
- be able to use distributions to summarize data.
- be able to use the average and the standard deviation to understand the normal distribution.
- be able to assess how well a normal distribution fits the data using a quantile-quantile plot.
- be able to interpret data from a boxplot.
Section 2: Introduction to ggplot2
You will learn how to use the ggplot2 package to create plots.
Section 3: Summarizing with dplyr
You will learn how to summarize data using the dplyr package.
Section 4: Gapminder
You will see examples of ggplot2 and dplyr in action with the Gapminder dataset.
Section 5: Data Visualization Principles
You will learn general principles to guide you in developing effective data visualizations.
Section 1
Data Types
Functions Overview:
numeric
Code From Video:
numeric
Key Points:
- Categorical data are variables that are defined by a small number of groups.
- Ordinal categorical data have an inherent order to the categories (mild/medium/hot, for example).
- Non-ordinal categorical data have no order to the categories.
- Numerical data take a variety of numeric values.
- Continuous variables can take any value within a range (height, for example).
- Discrete variables are limited to sets of specific values (counts, for example).
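The distinctions above can be sketched in base R. This is a minimal illustration, not code from the course: the vectors `region`, `spice`, `height`, and `n_children` and their values are made up for the example.

```r
# Non-ordinal categorical: regions have no inherent order (hypothetical values)
region <- factor(c("South", "West", "South", "Northeast"))

# Ordinal categorical: spiciness levels have an inherent order
spice <- factor(c("mild", "hot", "medium", "mild"),
                levels = c("mild", "medium", "hot"),
                ordered = TRUE)

# Continuous numerical: can take any value within a range
height <- c(68.5, 70.2, 64.0, 71.75)

# Discrete numerical: limited to specific values (integer counts here)
n_children <- c(0L, 2L, 1L, 3L)

is.ordered(region)  # FALSE: no order among the categories
is.ordered(spice)   # TRUE: mild < medium < hot
spice < "hot"       # comparisons are meaningful for ordered factors
class(height)       # "numeric"
class(n_children)   # "integer"
```

Storing categorical data as a factor (ordered where appropriate) rather than as plain character strings keeps the set of groups, and their order, explicit.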
DataCamp: Data Types
Code:
table() # counts the frequency of each unique value
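As a small sketch of how `table()` is typically used on a categorical vector (the vector `sex` below is hypothetical example data, not the course dataset):

```r
# Hypothetical categorical vector
sex <- c("Male", "Female", "Male", "Male", "Female")

# Frequency of each unique value
counts <- table(sex)
counts

# Convert the counts to proportions that sum to 1
prop.table(counts)
```

`prop.table()` is part of base R; dividing the counts by `length(sex)` would give the same proportions.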
1.2 Intro to Distributions
DataCamp Assessment: Normal distribution
Code:
library(dslabs)
data(heights)
x <- heights$height[heights$sex == "Male"]
# What proportion of the data is between 69 and 72 inches
# (taller than 69 but shorter than or equal to 72)? A proportion is between 0 and 1.
mean(x > 69 & x <= 72)
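Since this assessment concerns the normal distribution, a natural comparison is between the empirical proportion and the one predicted by a normal curve with the same average and standard deviation. The following is a sketch under stated assumptions: `x_demo` is a small made-up vector standing in for the male heights, so its results will differ from those for the real data.

```r
# Hypothetical heights in inches; stands in for the vector of male heights
x_demo <- c(66, 67.5, 68, 69, 69.5, 70, 70.5, 71, 72, 73, 74.5)

# Empirical proportion: the mean of a logical vector is the fraction of TRUE values
actual <- mean(x_demo > 69 & x_demo <= 72)

# Normal approximation: area between 69 and 72 under a normal curve
# with the same average and standard deviation as the data
avg <- mean(x_demo)
stdev <- sd(x_demo)
approx_prop <- pnorm(72, avg, stdev) - pnorm(69, avg, stdev)

actual
approx_prop
```

`pnorm(q, mean, sd)` is base R's normal cumulative distribution function, so the difference of two calls gives the probability of falling in the interval. The closer `approx_prop` is to `actual`, the better the normal distribution describes that range of the data.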